conventional model
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Balachandran, Vidhisha, Chen, Jingya, Chen, Lingjiao, Garg, Shivam, Joshi, Neel, Lara, Yash, Langford, John, Nushi, Besmira, Vineet, Vibhav, Wu, Yue, Yousefi, Safoora
Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods across nine state-of-the-art models and eight challenging tasks, including math and STEM reasoning, calendar planning, NP-hard problems, navigation, and spatial reasoning. We compare conventional models (e.g., GPT-4o) with models fine-tuned for inference-time scaling (e.g., o1) through evaluation protocols that involve repeated model calls, either independently or sequentially with feedback. These evaluations approximate lower and upper performance bounds and potential for future performance improvements for each model, whether through enhanced training or multi-model inference systems. Our extensive empirical analysis reveals that the advantages of inference-time scaling vary across tasks and diminish as problem complexity increases. In addition, simply using more tokens does not necessarily translate to higher accuracy in these challenging regimes. Results from multiple independent runs with conventional models using perfect verifiers show that, for some tasks, these models can achieve performance close to the average performance of today's most advanced reasoning models. However, for other tasks, a significant performance gap remains, even in very high scaling regimes. Encouragingly, all models demonstrate significant gains when inference is further scaled with perfect verifiers or strong feedback, suggesting ample potential for future improvements.
- Asia > Middle East > Palestine > Gaza Strip > Rafah Governorate > Rafah (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals
Han, Hyeon-Taek, Lee, Dae-Hyeok, Kwak, Heon-Gyu
Brain-computer interface (BCI) technology enables direct interaction between humans and computers by analyzing brain signals. Electroencephalogram (EEG) is one of the non-invasive tools used in BCI systems, providing high temporal resolution for real-time applications. However, EEG signals are often affected by a low signal-to-noise ratio, physiological artifacts, and individual variability, representing challenges in extracting distinct features. Also, motor imagery (MI)-based EEG signals could contain features with low correlation to MI characteristics, which might cause the weights of the deep model to become biased towards those features. To address these problems, we proposed the end-to-end deep preprocessing method that effectively enhances MI characteristics while attenuating features with low correlation to MI characteristics. The proposed method consisted of the temporal, spatial, graph, and similarity blocks to preprocess MI-based EEG signals, aiming to extract more discriminative features and improve the robustness. We evaluated the proposed method using the public dataset 2a of BCI Competition IV to compare the performances when integrating the proposed method into the conventional models, including the DeepConvNet, the M-ShallowConvNet, and the EEGNet. The experimental results showed that the proposed method could achieve the improved performances and lead to more clustered feature distributions of MI tasks. Hence, we demonstrated that our proposed method could enhance discriminative features related to MI characteristics.
Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction
Hiruma, Hyogo, Ito, Hiroshi, Ogata, Tetusya
Uncertainty of environments has long been a difficult characteristic to handle, when performing real-world robot tasks. This is because the uncertainty produces unexpected observations that cannot be covered by manual scripting. Learning based robot controlling methods are a promising approach for generating flexible motions against unknown situations, but still tend to suffer under uncertainty due to its deterministic nature. In order to adaptively perform the target task under such conditions, the robot control model must be able to accurately understand the possible uncertainty, and to exploratively derive the optimal action that minimizes such uncertainty. This paper extended an existing predictive learning based robot control method, which employ foresight prediction using dynamic internal simulation. The foresight module refines the model's hidden states by sampling multiple possible futures and replace with the one that led to the lower future uncertainty. The adaptiveness of the model was evaluated on a door opening task. The door can be opened either by pushing, pulling, or sliding, but robot cannot visually distinguish which way, and is required to adapt on the fly. The results showed that the proposed model adaptively diverged its motion through interaction with the door, whereas conventional methods failed to stably diverge. The models were analyzed on Lyapunov exponents of RNN hidden states which reflect the possible divergence at each time step during task execution. The result indicated that the foresight module biased the model to consider future consequences, which lead to embedding uncertainties at the policy of the robot controller, rather than the resultant observation. This is beneficial for implementing adaptive behaviors, which indices derivation of diverse motion during exploration.
Neuromimetic metaplasticity for adaptive continual learning
Cho, Suhee, Lee, Hyeonsu, Baek, Seungdae, Paik, Se-Bum
Conventional intelligent systems based on deep neural network (DNN) models encounter challenges in achieving human-like continual learning due to catastrophic forgetting. Here, we propose a metaplasticity model inspired by human working memory, enabling DNNs to perform catastrophic forgetting-free continual learning without any pre- or post-processing. A key aspect of our approach involves implementing distinct types of synapses from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility. This strategy allowed the network to successfully learn a continuous stream of information, even under unexpected changes in input length. The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications, dynamically allocating memory resources to retain both old and new information. Furthermore, the model demonstrated robustness against data poisoning attacks by selectively filtering out erroneous memories, leveraging the Hebb repetition effect to reinforce the retention of significant data.
- North America > United States (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Germany (0.04)
- (2 more...)
Decoding EEG-based Workload Levels Using Spatio-temporal Features Under Flight Environment
Lee, Dae-Hyeok, Kim, Sung-Jin, Kim, Si-Hyun, Lee, Seong-Whan
The detection of pilots' mental states is important due to the potential for their abnormal mental states to result in catastrophic accidents. This study introduces the feasibility of employing deep learning techniques to classify different workload levels, specifically normal state, low workload, and high workload. To the best of our knowledge, this study is the first attempt to classify workload levels of pilots. Our approach involves the hybrid deep neural network that consists of five convolutional blocks and one long short-term memory block to extract the significant features from electroencephalography signals. Ten pilots participated in the experiment, which was conducted within the simulated flight environment. In contrast to four conventional models, our proposed model achieved a superior grand--average accuracy of 0.8613, surpassing other conventional models by at least 0.0597 in classifying workload levels across all participants. Our model not only successfully classified workload levels but also provided valuable feedback to the participants. Hence, we anticipate that our study will make the significant contributions to the advancement of autonomous flight and driving leveraging artificial intelligence technology in the future.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > Kansas > Johnson County > Olathe (0.04)
- Europe > Germany (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Transportation > Air (0.89)
Investigating Continuous Learning in Spiking Neural Networks
In this paper, the use of third-generation machine learning, also known as spiking neural network architecture, for continuous learning was investigated and compared to conventional models. The experimentation was divided into three separate phases. The first phase focused on training the conventional models via transfer learning. The second phase trains a Nengo model from their library. Lastly, each conventional model is converted into a spiking neural network and trained. Initial results from phase 1 are inline with known knowledge about continuous learning within current machine learning literature. All models were able to correctly identify the current classes, but they would immediately see a sharp performance drop in previous classes due to catastrophic forgetting. However, the SNN models were able to retain some information about previous classes. Although many of the previous classes were still identified as the current trained classes, the output probabilities showed a higher than normal value to the actual class. This indicates that the SNN models do have potential to overcome catastrophic forgetting but much work is still needed.
- Asia > China > Sichuan Province > Chengdu (0.05)
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- Europe > Italy (0.04)
- (3 more...)
On Expressivity and Trainability of Quadratic Networks
Fan, Feng-Lei, Li, Mengzhou, Wang, Fei, Lai, Rongjie, Wang, Ge
Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neurons of our interest replaces the inner-product operation in the conventional neuron with a quadratic function. Despite promising results so far achieved by networks of quadratic neurons, there are important issues not well addressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network via quadratic activation is not fully elucidated, which makes the use of quadratic networks not well grounded. Practically, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than the conventional counterpart. To address these issues, we first apply the spline theory and a measure from algebraic geometry to give two theorems that demonstrate better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective training strategy referred to as ReLinear to stabilize the training process of a quadratic network, thereby unleashing the full potential in its associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and confirm the performance of quadratic deep learning. We have shared our code in \url{https://github.com/FengleiFan/ReLinear}.
- North America > United States > New York > Rensselaer County > Troy (0.04)
- North America > United States > New York > New York County > New York City (0.04)
What's Race Got to do with it? Predicting Youth Depression Across Racial Groups Using Machine and Deep Learning
Depression is a common yet serious mental disorder that affects millions of U.S. high schoolers every year. Still, accurate diagnosis and early detection remain significant challenges. In the field of public health, research shows that neural networks produce promising results in identifying other diseases such as cancer and HIV. This study proposes a similar approach, utilizing machine learning (ML) and artificial neural network (ANN) models to classify depression in a student. Additionally, the study highlights the differences in relevant factors for race subgroups and advocates the need for more extensive and diverse datasets. The models train on nationwide Youth Risk Behavior Surveillance System (YRBSS) survey data, in which the most relevant factors of depression are found with statistical analysis. The survey data is a structured dataset with 15000 entries including three race subsets each consisting of 900 entries. For classification, the research problem is modeled as a supervised learning binary classification problem. Factors relevant to depression for different racial subgroups are also identified. The ML and ANN models are trained on the entire dataset followed by different race subsets to classify whether an individual has depression. The ANN model achieves the highest F1 score of 82.90% while the best-performing machine learning model, support vector machines (SVM), achieves a score of 81.90%. This study reveals that different parameters are more valuable for modeling depression across diverse racial groups and furthers research regarding American youth depression.
- North America > United States > New York > Queens County > New York City (0.04)
- North America > United States > District of Columbia (0.04)
- Asia > Middle East > Iran > South Khorasan Province > Birjand (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)